The replication crisis has eroded public trust in science. Many famous studies, even those published in renowned journals, fail to produce the same results when replicated by other researchers. While this outcome stems from several problems in research practice, one aspect has received critical attention: reproducibility. The term reproducible research refers to studies that provide all materials necessary for other researchers to reproduce the scientific results. This allows others to identify flaws in calculations and improves scientific rigor. In this paper, we present a workflow for reproducible research using the R language and a set of additional packages and tools that simplify a reproducible research procedure.
The scientific database Scopus lists over 73,000 entries for the search term “reproducible research” at the time of writing. The importance of making research reproducible was recognized as early as the 1950s in multiple research subjects. With the Reproducibility Project, the Open Science Collaboration (Open Science Collaboration and others 2015) found that only about half of the studies in psychological research could be replicated by other researchers. Several factors have contributed to this problem. From a high-level perspective, the pressure to publish and the increase in scientific output have led to a plethora of findings that will not replicate. Both poor research design and (possibly unintentional) bad research practices have increased the number of papers that hold little to no value.
One frequently mentioned problem is HARKing (Kerr 1998), or “hypothesizing after the results are known”. When multiple statistical tests are conducted at a conventional alpha level (e.g., \(\alpha = .05\)), some tests are expected to reject the null hypothesis by chance alone; this is precisely what the error rate describes. If researchers then claim that these chance findings were their initial hypotheses, the results become indiscernible from randomness. However, this is unknown to the reviewer or reader, who only hears about the new hypotheses. HARKing thus produces findings where there are none. It is therefore crucial to determine the research hypothesis before collecting (or analyzing) the data.
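This mechanism is easy to demonstrate with a short simulation (an illustrative sketch, not part of the paper's analysis): when many tests are run on pure noise, roughly a fraction \(\alpha\) of them reject the null hypothesis.

```r
# Illustrative simulation: 1,000 t-tests comparing two samples of pure
# noise, i.e., with no true effect anywhere.
set.seed(42)
p_values <- replicate(1000, t.test(rnorm(30), rnorm(30))$p.value)

# The share of "significant" results approximates the alpha level of .05,
# even though every null hypothesis is true.
mean(p_values < .05)
```

Any of these chance rejections, presented after the fact as an a-priori hypothesis, would look like a genuine finding.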
Another strategy, often applied without ill intent, is p-hacking (Head et al. 2015). This technique is widespread in scientific publications and is probably already shifting consensus in science. p-hacking refers to techniques that alter the data until the desired p-value is reached. Omitting individual outliers, creating different grouping variables, adding or removing control variables: all of these can be considered p-hacking. This process also leads to results that will not hold under replication. It is therefore crucial to disclose what modifications have been performed on the data, so that the interpretability of the reported p-values can be evaluated.
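A minimal sketch of one such technique, selective outlier removal (hypothetical data, not an analysis from this paper): starting from two unrelated variables, discarding the observations that disagree most manufactures a correlation.

```r
# Two unrelated variables: any correlation between them is pure chance.
set.seed(1)
x <- rnorm(40)
y <- rnorm(40)
p_full <- cor.test(x, y)$p.value

# "p-hacking": keep only the 25 observations that agree best and discard
# the rest as alleged "outliers".
keep     <- order(abs(x - y))[1:25]
p_hacked <- cor.test(x[keep], y[keep])$p.value

# p_hacked is typically far smaller than p_full, despite no true effect.
```

Without a disclosed record of which observations were removed and why, a reader has no way to tell `p_hacked` apart from an honest result.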
Beyond researchers “massaging” their data to attain better p-values, many researchers do not understand the meaning of p-values in the first place. As Colquhoun (2017) found, many researchers misinterpret p-values and thus frame their findings much more strongly than is warranted. Adequate reporting of p-values is therefore equally important for the interpretability of results.
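A common misreading is that \(p < .05\) implies only a 5% chance of a false positive. A short calculation in the spirit of Colquhoun (2017), with illustrative (hypothetical) numbers, shows why this is wrong:

```r
# Assumed (hypothetical) scenario: 10% of tested hypotheses are true,
# tests are run at alpha = .05 with 80% power.
alpha      <- 0.05
power      <- 0.80
prevalence <- 0.10

false_positives <- alpha * (1 - prevalence)  # true nulls reaching p < .05
true_positives  <- power * prevalence        # real effects detected

fdr <- false_positives / (false_positives + true_positives)
fdr  # 0.36: over a third of "significant" findings would be false
```

Under these assumptions, the false discovery rate is 36%, not 5%; the p-value alone says little about how likely a finding is to be real.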
Lastly, scientific journals face the problem that they are mostly interested in publishing significant results. Contradictory “non-findings” thus seldom get published in renowned journals. There is little “value” for a researcher in publishing non-significant findings, as the additional work of writing a manuscript for a preprint server such as arXiv often does not reap the same reward as a journal publication. This so-called publication bias (Simonsohn, Nelson, and Simmons 2014) worsens the crisis, as only significant findings become available. It is thus necessary to simplify the process of publishing non-significant results.
Many different solutions have been proposed to address these challenges (e.g., Marwick, Boettiger, and Mullen 2018; Wilson et al. 2017). However, no uniform process exists that allows creating documents and additional reproducibility materials in one workflow.
In this paper, we demonstrate a research workflow based on the R-language and the R Markdown format. This paper was written using this workflow and the sources are freely available online (https://www.osf.io/kcbj5). Our workflow directly addresses the challenge of writing LNCS papers and a companion paper website that includes additional material and downloadable data.
In this paper, we will focus on the following aspects:
- rmdtemplates (Calero Valdez 2019)
- sdcMicro (???)
- here (???), usethis (???), drake (???)
- citr (???), gramr (???), questionr (???), esquisse (???)
- ggstatsplot (???)
- DiagrammeR (Iannone 2019)

Process diagrams as in Figure 4.1 can easily be created using the DiagrammeR (Iannone 2019) package.
library(DiagrammeR)
grViz(diagram = "
digraph boxes_and_circles {
  graph [rankdir = LR]
  node [shape = box,
        fontname = Helvetica]
  'Setup OSF Project Site'
  Test
  node [shape = circle]
  Start
  edge []
  Start -> 'Setup OSF Project Site';
  'Setup OSF Project Site' -> Test;
}
")

Figure 4.1: Example
- Option 1: sdcMicro
- Option 2: anonymizer
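Both packages cover this step; as an illustration of the underlying idea, the following base-R sketch (with hypothetical data and column names) replaces direct identifiers with random pseudonyms and keeps the lookup table separate from the shared data set:

```r
set.seed(2019)
d <- data.frame(name  = c("Alice", "Bob", "Alice"),
                score = c(12, 9, 15))

# Build a key mapping each identity to a random code; this key stays
# private and is never published alongside the data.
key <- data.frame(name = unique(d$name),
                  id   = sample(1000:9999, length(unique(d$name))))

# Publish only the pseudonymized data.
d$name <- key$id[match(d$name, key$name)]
```

The dedicated packages go further than this sketch: sdcMicro also assesses re-identification risk from quasi-identifiers, while anonymizer offers salted hashing of identifiers.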
On this sub-page you can find the data used as a downloadable file (CSV, Excel, or PDF).
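Exporting the data in an open format fits into the same workflow; a minimal base-R sketch (data and file name hypothetical, written to a temporary directory here rather than the site's data folder):

```r
d <- data.frame(participant = 1:3, score = c(12, 9, 15))

# Write the raw data as CSV so readers can download and re-analyze it.
f <- file.path(tempdir(), "study-data.csv")
write.csv(d, f, row.names = FALSE)

# Round-trip check: the file reads back identically.
d2 <- read.csv(f)
all.equal(d, d2)
```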
We used the following packages to create this document:
Barnier, Julien. 2019. Rmdformats: HTML Output Formats and Templates for ’Rmarkdown’ Documents. https://CRAN.R-project.org/package=rmdformats.
Bryan, Jennifer. 2018. “Excuse Me, Do You Have a Moment to Talk About Version Control?” The American Statistician 72 (1): 20–27.
Calero Valdez, André. 2019. Rmdtemplates: Rmdtemplates - an Opinionated Collection of Rmarkdown Templates. https://github.com/statisticsforsocialscience/rmd_templates.
Chang, Winston. 2019. Webshot: Take Screenshots of Web Pages. https://CRAN.R-project.org/package=webshot.
Colquhoun, David. 2017. “The Reproducibility of Research and the Misinterpretation of P-Values.” Royal Society Open Science 4 (12): 171085.
Gentleman, Robert, and Duncan Temple Lang. 2007. “Statistical Analyses and Reproducible Research.” Journal of Computational and Graphical Statistics 16 (1): 1–23.
Head, Megan L, Luke Holman, Rob Lanfear, Andrew T Kahn, and Michael D Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” PLoS Biology 13 (3): e1002106.
Iannone, Richard. 2019. DiagrammeR: Graph/Network Visualization. https://github.com/rich-iannone/DiagrammeR.
Kerr, Norbert L. 1998. “HARKing: Hypothesizing After the Results Are Known.” Personality and Social Psychology Review 2 (3): 196–217.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88.
Open Science Collaboration, and others. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.
Revelle, William. 2020. Psych: Procedures for Psychological, Psychometric, and Personality Research. https://CRAN.R-project.org/package=psych.
Simonsohn, Uri, Leif D Nelson, and Joseph P Simmons. 2014. “P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results.” Perspectives on Psychological Science 9 (6): 666–81.
Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, and Dana Seidel. 2019. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Computational Biology 13 (6): e1005510.
Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.
Zhu, Hao. 2020. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax.